Image by Fauxels from Pexels
The American Association of University Professors (AAUP) is a non-profit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011. The report begins with the following data visualisation.
In this lab you will discuss in your groups what makes a good data visualisation and create a better visualisation for the above data.
Please find the team you formed in last week’s workshop. In today’s workshop you won’t necessarily be asked to work in pairs, but you are welcome to do so if you find it useful. You are encouraged to discuss the questions with the rest of your team, but feel free to work on individual repositories if you prefer.
In the following:
- if you want to keep working in pairs or teams, create a shared repository and invite your team members as collaborators (see instructions from previous workshops);
- if you prefer to work individually, then each of you should create your own repository by cloning the template as described below.
Log onto GitHub and create a new repository by cloning today’s lab template project. To remind you of the steps:
lab-06, and click on Begin import.
Open RStudio and create a new version control project using the GitHub repository you have just made. To remind you of the steps:
Open the R Markdown document lab-06.Rmd and change the author to your name. Knit the document and make sure that it compiles correctly without any errors.
✅ ⬆️ Commit and push your changes to GitHub with an appropriate commit message. Remember to follow good version control practice: periodically stage and commit any substantial changes, then push them to GitHub so that your repository stays up to date.
Look at the following data visualisations. Discuss with your team members what might be problematic about each image. Do any of the visualisations have a problem with the 4 respects: people, data, mathematics and computer?
In your groups, take turns to work collaboratively on the following exercises.
Remember to regularly 🧶 knit, ✅ commit and ⬆️ push your work to your shared repository on GitHub. If you are faced with a merge conflict, then carefully follow the above instructions to reconcile the conflict before pushing your changes. If you come across an issue that you are unsure how to resolve, then please ask a tutor for assistance.
For the following exercises, you will need some of the data wrangling functions from the tidyverse package and the data visualisation functions from the ggplot2 package. Ensure that you have the following two lines of code at the top of lab-06.Rmd to make the commands available to you.
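The two lines are not shown in this copy of the worksheet. Assuming they are ordinary library() calls for the two packages named above, they would look something like this:

```r
# Assumed setup lines: attach the two packages named in the text.
library(tidyverse)  # data wrangling functions (dplyr, tidyr, readr, ...)
library(ggplot2)    # plotting; tidyverse already attaches it, so this is harmless
```

If either call fails, the package is probably not installed; install.packages("tidyverse") installs both.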
Let’s start by loading the data from the AAUP that was used to create the data visualisation shown at the beginning of this worksheet.
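Loading a CSV file is typically a single read_csv() call. The sketch below is self-contained, so it first writes a tiny stand-in file; the file, the name staff_toy and the values are all placeholders made up for illustration, not the AAUP figures. In the lab you would pass the path of the data file in your repository instead.

```r
library(tidyverse)

# Write a toy CSV to a temporary file (placeholder values, not the AAUP data).
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "faculty_type,1975,1989",
    "Type A,60,55",
    "Type B,40,45"
  ),
  toy_path
)

# Read it back; in the lab, replace toy_path with the path to the real file.
staff_toy <- read_csv(toy_path, show_col_types = FALSE)

# One row per faculty type and one column per year: the year columns keep
# their numeric-looking names.
staff_toy
names(staff_toy)
```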
While you work on these exercises, aim to create a merge conflict so that members 2, 4 and 6 can also practise resolving one. For example, in pair 1, member 1 can make a minor edit in the answer box (like adding placeholder text), then commit and push this change to the repository. At the same time, member 2 should work on the actual answer for the exercise (without pulling the recent changes). When member 2 tries to push, they’ll encounter an error due to a merge conflict. Follow the instructions provided above to resolve the conflict.
View the data. Discuss as a team the following questions and write down your answers.
Is the staff data wide or long?
When creating a data visualisation, it is generally preferable to have the data set in a long format. That is to say, each row should relate to a unique case/observation.
If the data set is in a wide format then we need to reshape its structure by pivoting from wide to long using pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().
Quick reminder: the function has the following arguments:
- data as usual.
- cols specifies the columns to pivot into longer format.
- names_to is the name of the column where column names of pivoted variables go (character string).
- values_to is the name of the column where data in pivoted variables go (character string).
Fill in the blanks in the following code chunk to pivot the staff data longer and save it as a new data frame called staff_long.
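To see how the four arguments fit together, here is a minimal sketch of pivot_longer() on a toy wide data frame. The data frame staff_toy and its values are made up for illustration and are not the AAUP figures; the column names year and percentage are the ones used in the plotting exercises.

```r
library(tidyverse)

# Toy wide data: one row per faculty type, one column per year.
# Placeholder values, not the AAUP figures.
staff_toy <- tibble(
  faculty_type = c("Type A", "Type B"),
  `1975` = c(60, 40),
  `1989` = c(55, 45)
)

staff_toy_long <- pivot_longer(
  data = staff_toy,
  cols = -faculty_type,      # pivot every column except faculty_type
  names_to = "year",         # old column names go here (as character)
  values_to = "percentage"   # old cell values go here
)

# 2 faculty types x 2 years = 4 rows, one per (faculty_type, year) pair.
staff_toy_long
```

Note that the new year column is a character column, because it is built from column names.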
Inspect staff_long. How many rows does it have? Does this correspond to your answer from Exercise 1?
We will begin by plotting instructional staff employment trends as a dot plot. Copy the following code that creates a dot plot of percentage on the y-axis against year on the x-axis, with the dots coloured based on the faculty_type. Ensure that you understand what each part of the code is doing.
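The code chunk is not shown in this copy of the worksheet. A minimal sketch of such a dot plot, assuming the long data has the columns faculty_type, year and percentage, is given below. It uses a toy data frame, staff_long_toy, with placeholder values (not the AAUP figures) so that it runs on its own; in the lab you would use staff_long instead.

```r
library(tidyverse)

# Toy stand-in for staff_long: placeholder values, not the AAUP figures.
# year is deliberately stored as character, as it is straight after pivoting.
staff_long_toy <- tibble(
  faculty_type = rep(c("Type A", "Type B"), each = 3),
  year = rep(c("1975", "1989", "1993"), times = 2),
  percentage = c(60, 55, 50, 40, 45, 50)
)

dot_plot <- ggplot(
  data = staff_long_toy,
  mapping = aes(
    x = year,               # year on the x-axis
    y = percentage,         # percentage on the y-axis
    colour = faculty_type   # one colour per faculty type
  )
) +
  geom_point()              # draw each observation as a dot

dot_plot
```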
Perhaps the trend over time can be better visualised using lines rather than dots. Edit the above code to use the geom_line() command.
What is wrong with the graph? Have a look at the data and the dot plot for clues as to what might be wrong before progressing to the next exercise. (You do not need to say how to fix it here—that is the next question!)
In the dot plot from exercise 3, notice that the scaling along the x-axis is not consistent. The physical distance between consecutive years is the same, but numerically there are 14 years between the first two cases and 2 years between the last two!
This is because the year variable in staff_long is a "character" variable, not a numerical variable.
Complete the following code to change the type of year from character to numerical.
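One common way to make this kind of conversion is mutate() together with as.numeric(). The sketch below applies it to a toy data frame; the names staff_long_toy and staff_long_num and the values are placeholders for illustration, not the AAUP figures.

```r
library(tidyverse)

# Toy stand-in for staff_long with year stored as character.
# Placeholder values, not the AAUP figures.
staff_long_toy <- tibble(
  faculty_type = rep(c("Type A", "Type B"), each = 3),
  year = rep(c("1975", "1989", "1993"), times = 2),
  percentage = c(60, 55, 50, 40, 45, 50)
)

# Overwrite year with its numeric version.
staff_long_num <- staff_long_toy %>%
  mutate(year = as.numeric(year))

class(staff_long_toy$year)  # "character"
class(staff_long_num$year)  # "numeric"

# Once year is numeric, the gaps between years are real distances.
diff(unique(staff_long_num$year))  # 14 4
```

With a numeric year, ggplot2 uses a continuous x-axis, so unequal gaps between years are drawn as unequal distances.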
Now create the line plot described in exercise 4 to illustrate how the faculty proportions have changed over time.
Improve the line plot from the previous exercise by fixing up its labels (title, axis labels, and legend label) as well as any other components you think could benefit from improvement.
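Labels are usually set with labs(), where each argument is named after the aesthetic or plot element it relabels. The sketch below shows the mechanics on a toy data frame; the data values and the label texts are placeholders for illustration, and choosing theme_minimal() is just one possible styling decision.

```r
library(tidyverse)

# Toy stand-in for staff_long with a numeric year.
# Placeholder values, not the AAUP figures.
staff_long_toy <- tibble(
  faculty_type = rep(c("Type A", "Type B"), each = 3),
  year = rep(c(1975, 1989, 1993), times = 2),
  percentage = c(60, 55, 50, 40, 45, 50)
)

line_plot <- ggplot(
  staff_long_toy,
  aes(x = year, y = percentage, colour = faculty_type)
) +
  geom_line() +
  labs(
    title = "Placeholder title",             # plot title
    x = "Year",                              # x-axis label
    y = "Percentage of instructional staff", # y-axis label
    colour = "Faculty type"                  # legend title (colour aesthetic)
  ) +
  theme_minimal()

line_plot
```

The legend title is set through colour because the legend belongs to the colour aesthetic; a legend for fill would be relabelled with fill instead.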
Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story? Write down your idea(s). The more precise you are, the easier the next step will be. Get creative, and think about how you can modify the dataset to give you new/different variables to work with.
Implement at least one of the ideas you came up with in the previous exercise. You should produce an improved data visualisation and accompany it with a brief paragraph describing the choices you made, specifically discussing what you didn’t like in the original plot and why, and how you addressed it in the visualisation you created.
At the end of the lab, make sure you 🧶 knit, ✅ commit and ⬆️ push any remaining changes to your repository on GitHub.
If you worked with a shared repository, make sure to resolve any merge conflicts and then ⬇️ Pull the latest changes so that your personal copy is up-to-date. If you were not the owner of the shared repository, then please follow the instructions below to create your own copy of the repository for future reference:
If you were the owner of the shared repository, make sure at the end of the workshop that only you can make further changes to it by removing your team members’ collaboration permissions. To do this: